A curated dataset of complete Enterobacteriaceae plasmids compiled from the NCBI nucleotide database
Abstract
Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments such as gene or contig sequences, which makes retrieving complete plasmids for downstream analyses challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences and chromosomal sequences misannotated as plasmids. The curated dataset includes over 2000 complete plasmid sequences. Protein sequences produced by translating each complete plasmid nucleotide sequence in all six reading frames are also provided. Further analysis and discussion of the dataset are presented in an accompanying research article: "Ordering the mob: insights into replicon and MOB typing…" (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
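The six-frame translation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline, and the abstract does not say which tool they used; the function names below (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical. The sketch assumes the standard codon assignments (NCBI translation table 1, which shares its codon-to-amino-acid mapping with the bacterial table 11), treats the sequence as linear, and writes codons containing ambiguous bases as `X`.

```python
from itertools import product

# Standard codon assignments (NCBI translation table 1), listed in the
# conventional T, C, A, G order for each codon position; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT characters are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon, ignoring a trailing partial codon.

    Codons that are not in the table (e.g. containing 'N') become 'X'.
    """
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    frames = six_frame_translation("ATGGCCTAA")
    print(frames["+1"])  # MA*
    print(frames["-1"])  # LGH
```

Because plasmids are usually circular, a fuller implementation would also need to handle codons that span the arbitrary start coordinate of the record; this sketch does not.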
Similar resources
RefSeq Frequently Asked Questions (FAQ)
The NCBI Reference Sequence (RefSeq) project provides sequence records and related information for numerous organisms, and provides a baseline for medical, functional, and comparative studies. Whereas the International Nucleotide Sequence Database Collaboration (INSDC, made up of GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan) represents an archival repository of all s...
NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes s...
In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing.
In the work presented here, we designed and developed two easy-to-use Web tools for in silico detection and characterization of whole-genome sequence (WGS) and whole-plasmid sequence data from members of the family Enterobacteriaceae. These tools will facilitate bacterial typing based on draft genomes of multidrug-resistant Enterobacteriaceae species by the rapid detection of known plasmid type...
About Viral and Phage Genome Processing and Tools
The National Center for Biotechnology Information (NCBI) Viral Genome Resource hosts all virus-related data and tools. All complete viral genome sequences deposited in the International Nucleotide Sequence Database Collaboration (INSDC) databases are collected by the NCBI Viral Genome Project (1). A RefSeq record is created from one of the complete genome sequences for each virus species, and t...
Isolation, Cloning and Sequence Analysis of 1-Aminocyclopropane-1-Carboxylate Deaminase Gene from Native Sinorhizobium meliloti
Background: Many plant growth-promoting bacteria, including Rhizobia, contain the 1-aminocyclopropane-1-carboxylate (ACC) deaminase enzyme, which can cleave ACC and thereby lower the level of ethylene in stressed plants. Drought and salinity are the most common environmental stress factors for plants in Iran. Objectives: The main aim of this research was the development of bio-fertilizers containing A...